Mining the WHO Drug Safety Database Using Lasso Logistic Regression
Abstract
For reasons such as low incidence, occurrence in groups frequently excluded from clinical trials, and long onset times, some adverse drug reactions (ADRs) of a new medicinal product remain unnoticed until after market launch. The World Health Organization (WHO), in collaboration with the Uppsala Monitoring Centre (UMC), continuously collects spontaneous ADR reports from around the world and uses data mining approaches to detect which drugs are most likely to cause which previously unanticipated ADRs. This WHO drug safety database, the largest of its kind, comprises about 3.8 million accumulated reports. The data mining methods currently in use are based on two-dimensional projections of the data with respect to a given drug-ADR combination. Each combination is given an association score based on the discrepancy between the observed and expected number of reports on it. In this thesis these disproportionality-based methods are represented by the UMC's information component (IC), a shrunk Bayesian measure. A limitation of the IC is its inability to deal with confounding by co-medication and with masking. Confounding by co-medication means that the association between a drug and a certain ADR may seem stronger than it really is because the drug is used together with another drug that is truly associated with the ADR. Masking, on the other hand, is a phenomenon whereby a very strong association between an ADR and one drug may weaken the associations between that ADR and other drugs.
To address these issues, this thesis proposes a novel method for mining the WHO drug safety database: lasso logistic regression (LLR). Instead of studying each combination separately, the LLR model fixes the ADR under study and predicts its presence on a report from the presence of all drugs occurring in the database, thus yielding a logistic regression framework. Further, independent Laplace prior distributions are placed on the parameters, resulting in lasso-type shrinkage in which a subset of the parameters is shrunk to exactly zero. The LLR was confirmed to correct for confounding by co-medication and masking in simulated scenarios and specific clinical examples. Moreover, with a specific degree of shrinkage, the LLR achieved 10% higher recall than the IC while maintaining precision, as evaluated against a test database. Although its transparency is limited, the LLR has an important role to play in the future of ADR monitoring.
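The contrast between a per-combination disproportionality score and a joint lasso logistic regression can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. The four-cell toy report table is hypothetical. The score `log2((O + 0.5) / (E + 0.5))` is a simplified shrunk observed-to-expected ratio standing in for the IC, whose exact Bayesian formulation is not given in the abstract. The fit uses a plain proximal gradient loop with soft-thresholding, which yields the posterior mode under independent Laplace priors but is not necessarily the fitting procedure used in the thesis.

```python
import math

# Hypothetical toy reports: (drug_a, drug_b, n_reports, n_with_adr).
# Drug A raises the ADR rate from 10% to 50%; drug B has no effect of
# its own but is mostly reported together with A (co-medication).
CELLS = [
    (1, 1, 40, 20),
    (1, 0, 20, 10),
    (0, 1, 20, 2),
    (0, 0, 120, 12),
]
N = sum(c[2] for c in CELLS)
N_ADR = sum(c[3] for c in CELLS)


def shrunk_log_ratio(drug_index):
    """Simplified score log2((O + 0.5) / (E + 0.5)) for one drug-ADR pair."""
    n_drug = sum(c[2] for c in CELLS if c[drug_index] == 1)
    observed = sum(c[3] for c in CELLS if c[drug_index] == 1)
    expected = n_drug * N_ADR / N  # expected count if drug and ADR were independent
    return math.log2((observed + 0.5) / (expected + 0.5))


def soft_threshold(z, t):
    """Proximal operator of t * |z|; returns exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_logistic(lam=0.01, step=1.0, n_iter=5000):
    """Proximal gradient descent on mean log-loss + lam * (|b_a| + |b_b|)."""
    b0 = b_a = b_b = 0.0
    for _ in range(n_iter):
        g0 = g_a = g_b = 0.0
        for a, b, n, n_adr in CELLS:
            p = 1.0 / (1.0 + math.exp(-(b0 + b_a * a + b_b * b)))
            resid = (n * p - n_adr) / N  # this cell's share of the gradient
            g0 += resid
            g_a += resid * a
            g_b += resid * b
        b0 -= step * g0  # the intercept is left unpenalised
        b_a = soft_threshold(b_a - step * g_a, step * lam)
        b_b = soft_threshold(b_b - step * g_b, step * lam)
    return b0, b_a, b_b


score_a = shrunk_log_ratio(0)
score_b = shrunk_log_ratio(1)
b0, coef_a, coef_b = fit_lasso_logistic()
print(f"disproportionality score: A={score_a:.2f}, B={score_b:.2f}")
print(f"lasso coefficients:       A={coef_a:.2f}, B={coef_b:.2f}")
```

On this toy table the marginal score for drug B comes out positive, because B is mostly reported alongside A. In the joint model the coefficient for B is shrunk to zero and the coefficient for A stays clearly positive. This mirrors the confounding-by-co-medication scenario described above, though only on a constructed example.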
Similar resources
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...
Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections
BACKGROUND: Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome. METHODS: We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected u...
Extraction of Drug Crime Patterns and Identifying People at Risk Using Data Mining Techniques
Introduction: In recent years, technological advancement and the growth of information technology in organizations have produced a large store of data on drug-related offenses. Analyzing these data and discovering the hidden patterns in them can help detect and prevent crimes in this area. This paper aimed to identify people susceptible to drug trafficking in Si...
Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial
Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video)—much of it expressed in rich and ambiguous natural language. Traditionally, to analyze natural language, one has used qualitative data-analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtua...
Publication date: 2007